Gridded Data¶
hvPlot provides one API to explore data of many different types. Previous sections have exclusively worked with tabular data stored in pandas (or pandas-like) DataFrames. The other most common type of data is the n-dimensional array. hvPlot aims to eventually support different array libraries but for now focuses on xarray. xarray provides a convenient and very powerful wrapper to label the axes and coordinates of multi-dimensional (n-D) arrays. This user guide covers how to leverage xarray and hvPlot to visualize and explore data of different dimensionality, ranging from simple 1D data, to 2D image-like data, to multi-dimensional cubes of data.
For these examples we’ll use the North American air temperature dataset:
import xarray as xr
import hvplot.xarray # noqa
air_ds = xr.tutorial.open_dataset('air_temperature').load()
air = air_ds.air
air_ds
<xarray.Dataset>
Dimensions: (lat: 25, lon: 53, time: 2920)
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
* lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
* time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
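To make the idea of labeled axes concrete, here is a minimal sketch that wraps a plain NumPy array in an xarray DataArray with the same dimension names and grid spacing as the tutorial dataset. The name `air_demo` and the random values are assumptions made purely for illustration; this is synthetic data, not the NMC reanalysis, and it covers only two days of 6-hourly steps.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Coordinate labels on the same 2.5-degree grid as the tutorial data.
lat = np.linspace(75, 15, 25, dtype="float32")    # 75.0, 72.5, ..., 15.0
lon = np.linspace(200, 330, 53, dtype="float32")  # 200.0, 202.5, ..., 330.0
# Only eight 6-hourly time steps, to keep the example small.
time = pd.date_range("2013-01-01", periods=8, freq=pd.Timedelta(hours=6))

# Synthetic values (assumption): random temperatures in kelvin.
rng = np.random.default_rng(0)
values = rng.normal(280, 10, (time.size, lat.size, lon.size)).astype("float32")

# Wrap the raw array: dims name each axis, coords label each position.
air_demo = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
    attrs={"units": "degK", "long_name": "Synthetic air temperature"},
)
print(air_demo)
```

Because the dimensions and coordinates carry names, later selections and plots can refer to 'lat', 'lon' and 'time' rather than to positional axis numbers.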
1D Plots¶
Selecting the data at a particular lat/lon coordinate we get a 1D dataset of air temperatures over time:
air1d = air.sel(lat=40, lon=285)
air1d.hvplot()
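The selection step can be examined on its own, without plotting. The sketch below uses a synthetic stand-in array (`air_demo`, a hypothetical name with random values, not the tutorial data) to show that `sel` with scalar labels drops the selected dimensions, and that xarray's `method='nearest'` option can be used when the requested point does not lie exactly on the grid.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial array (assumption: random values).
lat = np.linspace(75, 15, 25)
lon = np.linspace(200, 330, 53)
time = pd.date_range("2013-01-01", periods=8, freq=pd.Timedelta(hours=6))
rng = np.random.default_rng(0)
air_demo = xr.DataArray(
    rng.normal(280, 10, (time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

# Exact label lookup: (40, 285) is a grid point, so only 'time' remains.
exact = air_demo.sel(lat=40, lon=285)
print(exact.dims)

# (41, 284) is not on the grid; 'nearest' snaps to the closest grid point.
nearest = air_demo.sel(lat=41, lon=284, method="nearest")
print(float(nearest.lat), float(nearest.lon))
```

Without `method='nearest'`, selecting a label that is absent from the coordinate raises a `KeyError`, so the nearest lookup is a convenient choice when exploring arbitrary locations.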
Notice how the axes are already appropriately labeled, because xarray stores the metadata required. We can also further subselect the data and use * to overlay plots:
air1d_sel = air1d.sel(time='2013-01')
air1d_sel.hvplot(color='purple') * air1d_sel.hvplot.scatter(marker='o', color='blue', size=15)
air.lat
<xarray.DataArray 'lat' (lat: 25)>
array([75. , 72.5, 70. , 67.5, 65. , 62.5, 60. , 57.5, 55. , 52.5, 50. , 47.5,
45. , 42.5, 40. , 37.5, 35. , 32.5, 30. , 27.5, 25. , 22.5, 20. , 17.5,
15. ], dtype=float32)
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
Attributes:
standard_name: latitude
long_name: Latitude
units: degrees_north
axis: Y
Selecting multiple¶
If we select multiple coordinates along one axis and use a chart plot type such as a line, the data will automatically be split by that coordinate:
air.sel(lat=[20, 40, 60], lon=285).hvplot.line()
To plot a different relationship we can explicitly request to display the latitude along the y-axis and use the by keyword to color each longitude (or ‘lon’) differently (note that this differs from the hue keyword xarray uses):
air.sel(time='2013-02-01 00:00', lon=[280, 285]).hvplot.line(y='lat', by='lon', legend='top_right')
2D Plots¶
By default the DataArray.hvplot() method generates an image if the data is two-dimensional.
air2d = air.sel(time='2013-06-01 12:00')
air2d.hvplot(width=400)
Alternatively, we can plot the same data using the contour and contourf methods, which provide a levels argument to control the number of iso-contours to draw:
air2d.hvplot.contour(width=400, levels=20) + air2d.hvplot.contourf(width=400, levels=8)
n-D Plots¶
If the data has more than two dimensions, hvplot() defaults to a histogram unless we give it further hints:
air.hvplot()
However, we can tell it to apply a groupby along a particular dimension, allowing us to explore the data as images along that dimension with a slider:
air.hvplot(groupby='time', width=500)
By default, for numeric types you’ll get a slider and for non-numeric types you’ll get a selector. Use widget_type and widget_location to control the look of the widget. To learn more about customizing widget behavior see Widgets.
air.hvplot(groupby='time', width=600, widget_type='scrubber', widget_location='bottom')
If we pick a different, lower dimensional plot type (such as a ‘line’) it will automatically apply a groupby over the remaining dimensions:
air.hvplot.line(width=600)
Statistical plots¶
Statistical plots such as histograms, kernel-density estimates, or violin and box-whisker plots aggregate the data across one or more of the coordinate dimensions. For instance, plotting a KDE provides a summary of all the air temperature values but we can, once again, use the by keyword to view each selected latitude (or ‘lat’) separately:
air.sel(lat=[25, 50, 75]).hvplot.kde('air', by='lat', alpha=0.5)
Using the by keyword we can break down the distribution of the air temperature across one or more variables:
air.hvplot.violin('air', by='lat', color='lat', cmap='Category20')
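To see what aggregating across the coordinate dimensions means here, the sketch below pools the values for each selected latitude by hand using xarray alone. The array `air_demo` is a hypothetical synthetic stand-in with an artificial latitude trend, so the numbers it prints are not real temperatures.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): colder at high latitudes, plus noise.
lat = np.linspace(75, 15, 25)
lon = np.linspace(200, 330, 53)
time = pd.date_range("2013-01-01", periods=8, freq=pd.Timedelta(hours=6))
rng = np.random.default_rng(0)
noise = rng.normal(0, 3, (time.size, lat.size, lon.size))
air_demo = xr.DataArray(
    300 - 0.5 * lat[None, :, None] + noise,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

# Collapse every dimension except 'lat' into one flat 'sample' dimension.
selected = air_demo.sel(lat=[25, 50, 75])
samples = selected.stack(sample=("time", "lon"))

# Each latitude now owns one pooled sample of time x lon values.
for lat_value in samples.lat.values:
    pooled = samples.sel(lat=lat_value).values
    print(lat_value, pooled.size, round(float(pooled.mean()), 2))
```

Each pooled sample corresponds to the set of values summarized by one curve in a KDE plot or one violin when `by='lat'` is used.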
Datashading¶
If you are plotting a large amount of data at once, you can consider using the hvPlot interface to Datashader, which can be enabled simply by setting datashade=True.
Note that by declaring that the data should not be grouped by another coordinate variable, by setting groupby=[], we can plot all the datapoints, showing us the spread of air temperatures in the dataset:
air.hvplot.scatter('time', groupby=[], datashade=True) *\
air.mean(['lat', 'lon']).hvplot.line('time', color='indianred')
Here we also overlaid a non-datashaded line plot of the average temperature at each time. If you enable the appropriate hover tool, the overlaid line supports hovering and zooming even in a static export, such as a page served from a web server or sent in an email. The datashaded raw-data plot, by contrast, is aggregated spatially before it is sent to the browser, so a static export has only the fixed spatial binning that was computed at that time. If you have a live Python process, the raw data is re-aggregated each time you pan or zoom, letting you see the entire dataset regardless of size.
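The difference between a fixed aggregate and re-aggregation can be illustrated with plain NumPy. The sketch below uses `np.histogram2d` as a simple stand-in for the binning step; it is not Datashader's implementation, and the point cloud, grid size, and zoom window are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic point cloud (assumption): a seasonal cycle plus noise.
rng = np.random.default_rng(0)
n_points = 200_000
day = rng.uniform(0.0, 730.0, n_points)
temp = 280 + 15 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 5, n_points)

# Static export: points are binned once onto a fixed 400 x 300 grid.
full, day_edges, temp_edges = np.histogram2d(day, temp, bins=(400, 300))
print(full.shape, int(full.sum()))

# Live process: on zoom, bin again using only the visible window,
# so the same number of bins now covers a much smaller region.
window = [[100.0, 130.0], [270.0, 300.0]]
zoomed, _, _ = np.histogram2d(day, temp, bins=(400, 300), range=window)
print(zoomed.shape, int(zoomed.sum()))
```

In the static case, zooming can only enlarge the bins that already exist, whereas re-binning the visible window recovers finer structure from the raw points.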